A comparison of estimated and MAP-predicted formants and fundamental frequencies with a speech reconstruction application
Abstract
This work compares the accuracy of fundamental frequency and formant frequency estimation methods, and of maximum a posteriori (MAP) prediction from MFCC vectors, against hand-corrected reference values. Five fundamental frequency estimation methods are compared with fundamental frequency prediction from MFCC vectors on both clean and noisy speech. Similarly, three formant frequency estimation and prediction methods are compared. An analysis of estimation and prediction accuracy shows that prediction from MFCCs gives the most accurate voicing classification on both clean and noisy speech. On clean speech, fundamental frequency estimation outperforms prediction from MFCCs, but as the noise level increases, prediction is significantly more robust than estimation. Formant frequency prediction is more accurate than estimation on both clean and noisy speech. A subjective analysis of the estimation and prediction methods is also carried out by reconstructing speech from the acoustic features.
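As a rough illustration of how such an accuracy comparison against hand-corrected references might be scored, the sketch below assumes frame-aligned F0 tracks in which 0 Hz marks an unvoiced frame. It computes two common measures: the percentage of frames whose voiced/unvoiced decision disagrees with the reference, and the mean absolute F0 error (as a percentage of the reference) over frames that both tracks mark as voiced. The function names and the exact error definitions are assumptions for illustration; the abstract does not state which measures the paper uses.

```python
import numpy as np

def voicing_error(ref_f0, est_f0):
    """Percentage of frames whose voiced/unvoiced decision differs from the reference.

    Assumption: both tracks are sampled on the same frames and 0 Hz marks unvoiced.
    """
    ref_voiced = np.asarray(ref_f0) > 0
    est_voiced = np.asarray(est_f0) > 0
    return 100.0 * np.mean(ref_voiced != est_voiced)

def f0_percentage_error(ref_f0, est_f0):
    """Mean absolute F0 error, as a percentage of the reference F0, over frames
    that both the reference and the estimate mark as voiced (hypothetical metric)."""
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    both = (ref > 0) & (est > 0)
    if not np.any(both):
        return float("nan")  # no frames to compare
    return 100.0 * np.mean(np.abs(est[both] - ref[both]) / ref[both])

# Toy frame-level tracks in Hz (illustrative values, not data from the paper).
ref = np.array([0.0, 120.0, 125.0, 130.0, 0.0])
est = np.array([0.0, 118.0, 125.0, 0.0, 110.0])
print(voicing_error(ref, est))        # 40.0
print(f0_percentage_error(ref, est))  # about 0.83
```

Keeping voicing errors separate from F0 errors on voiced frames lets a method that mislabels frames be distinguished from one that labels frames correctly but tracks pitch poorly, which matters when comparing robustness as noise increases.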
Similar resources
Formants Estimation Techniques for Speech Analysis
Measuring formant frequencies in speech signals is indispensable for research but technically problematic. Accurate measurement of formant frequencies is important in many studies of speech perception and production. Unfortunately, no fully effective method exists for reliably estimating these frequencies. This paper presents a comparative study of two techniques of speech parameteriz...
Impact of Novel Incorporation of CT-based Segment Mapping into a Conjugated Gradient Algorithm on Bone SPECT Imaging: Fundamental Characteristics of a Context-specific Reconstruction Method
Objective(s): The latest single-photon emission computed tomography (SPECT)/computed tomography (CT) reconstruction system, referred to as xSPECT Bone™, is a context-specific reconstruction system utilizing tissue segmentation information from CT data, which is called a zone map. The aim of this study was to evaluate the effects of zone-map enhancement incorporated into the ordered-subset conjug...
Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimation in Wiener filtering of noisy speech. No assumption is made about the nature or stationarity of the noise. No voice activity detection (VAD) or other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined systems of equatio...
Investigation of the Formant Structure of Persian Vowels in Azeri-Persian Bilingual Adults
Objective: Vowels are the nuclei of syllables, and formant structure is one of the most important acoustic characteristics of speech sounds, relevant to both their articulation and their perception. Formants reflect the shape and size of the vocal tract. There are slight differences between the vocal tracts of different people, due to which the formant structures of a vowel in one person ar...
Predicting Formant Frequencies from MFCC Vectors
This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficients (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCCs and formant frequencies using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predict...
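The joint-density approach described for predicting formants from MFCC vectors can be sketched as follows. This is a minimal sketch under stated assumptions, not the method from either paper: it assumes an already-trained GMM over stacked vectors z = [x; y] (x an MFCC vector, y the formant frequencies), picks the component with the highest posterior probability given x, and returns that component's conditional mean of y given x, which for a Gaussian is also its mode. The names `log_gauss` and `map_predict` and the toy parameters are hypothetical, and GMM training (e.g. by EM) is not shown.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian N(mean, cov) evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def map_predict(x, weights, means, covs, dx):
    """Hypothetical MAP-style prediction of y from x using a joint GMM over z = [x; y].

    weights: (K,), means: (K, D), covs: (K, D, D); the first dx dimensions are MFCCs.
    """
    x = np.asarray(x, dtype=float)
    # log p(k, x) = log p(k | x) + const, using only the MFCC marginal of each component.
    scores = [np.log(w) + log_gauss(x, m[:dx], c[:dx, :dx])
              for w, m, c in zip(weights, means, covs)]
    k = int(np.argmax(scores))  # most probable component given x
    m, c = means[k], covs[k]
    sxx, syx = c[:dx, :dx], c[dx:, :dx]
    # Conditional mean of y given x within component k (equal to its mode).
    return m[dx:] + syx @ np.linalg.solve(sxx, x - m[:dx])

# Toy 2-component GMM with a 1-D "MFCC" and a 1-D "formant" in Hz (illustrative only).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 500.0], [5.0, 1500.0]])
covs = np.array([[[1.0, 50.0], [50.0, 10000.0]],
                 [[1.0, 50.0], [50.0, 10000.0]]])
print(map_predict([0.5], weights, means, covs, dx=1))  # [525.]
```

With these toy parameters, an input of 0.5 falls in the first component and gives 500 + 50 × 0.5 = 525 Hz. Selecting a single component is one way to make the prediction MAP-like; a posterior-weighted sum of the per-component conditional means (the MMSE estimate) is a common alternative.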